Multidimensional Triangulation and Interpolation for Reinforcement Learning
Abstract
Dynamic programming, Q-learning, and other discrete Markov Decision Process solvers can be applied to continuous d-dimensional state spaces by quantizing the state space into an array of boxes. This is often problematic above two dimensions: a coarse quantization can lead to poor policies, and fine quantization is too expensive. Possible solutions are variable-resolution discretization, or function approximation by neural nets. A third option, which has been little studied in the reinforcement learning literature, is interpolation on a coarse grid. In this paper we study interpolation techniques that can result in vast improvements in the online behavior of the resulting control systems: multilinear interpolation, and an interpolation algorithm based on an interesting regular triangulation of d-dimensional space. We adapt these interpolators under three reinforcement learning paradigms: (i) offline value iteration with a known model, (ii) Q-learning, and (iii) online value iteration with a previously unknown model learned from data. We describe empirical results, and the resulting implications for practical learning of continuous non-linear dynamic control.

1 GRID-BASED INTERPOLATION TECHNIQUES

Reinforcement learning algorithms generate functions that map states to "cost-to-go" values. When dealing with continuous state spaces these functions must be approximated. The following approximators are frequently used:

• Fine grids may be used in one or two dimensions. Above two dimensions, fine grids are too expensive. Value functions can be discontinuous, which (as we will see) can lead to suboptimalities even with very fine discretization in two dimensions.

• Neural nets have been used in conjunction with TD [Sutton, 1988] and Q-learning [Watkins, 1989] in very high-dimensional spaces [Tesauro, 1991, Crites and Barto, 1996]. While promising, it is not always clear that they produce the accurate value functions that might be needed for fine near-optimal control of dynamic systems, and the most commonly used methods of applying value iteration or policy iteration with a neural-net value function are often unstable [Boyan and Moore, 1995].
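To make the two grid-based interpolators concrete, the following is a minimal sketch, not taken from the paper: the function names, the unit-spaced grid indexed in grid coordinates, and the clamping at the upper boundary are our own assumptions. The first routine blends all 2^d vertices of the box containing the query point; the second interpolates on the Kuhn (Freudenthal) triangulation, a standard regular triangulation of d-dimensional space of the kind referred to above, and touches only d + 1 vertices.

import numpy as np
from itertools import product

def multilinear_interpolate(values, x):
    """Multilinear interpolation on a regular grid of cost-to-go values.

    values: d-dimensional array of values at integer grid vertices.
    x:      query point in grid coordinates, 0 <= x[i] <= values.shape[i]-1.
    Blends the 2^d vertices of the surrounding box, so the per-query
    cost grows exponentially with the dimension d.
    """
    base = np.minimum(np.floor(x).astype(int), np.array(values.shape) - 2)
    f = x - base                        # fractional position inside the box
    total = 0.0
    for corner in product((0, 1), repeat=len(x)):
        corner = np.asarray(corner)
        weight = np.prod(np.where(corner == 1, f, 1.0 - f))
        total += weight * values[tuple(base + corner)]
    return total

def simplex_interpolate(values, x):
    """Interpolation on the Kuhn (Freudenthal) triangulation of each box.

    Each box splits into d! simplices; the simplex containing x is found
    by sorting the fractional coordinates, and only its d + 1 vertices
    contribute, so a query costs O(d log d) rather than O(2^d).
    """
    d = len(x)
    base = np.minimum(np.floor(x).astype(int), np.array(values.shape) - 2)
    f = x - base
    order = np.argsort(-f)              # axes by decreasing fractional part
    fs = f[order]
    # Barycentric weights for vertices v_0 (box corner), v_1, ..., v_d,
    # where v_k = v_{k-1} + unit vector along axis order[k-1].
    w = np.empty(d + 1)
    w[0] = 1.0 - fs[0]
    w[1:d] = fs[:-1] - fs[1:]
    w[d] = fs[-1]
    vertex = base.copy()
    total = w[0] * values[tuple(vertex)]
    for k, axis in enumerate(order):    # walk along the simplex edges
        vertex[axis] += 1
        total += w[k + 1] * values[tuple(vertex)]
    return total

# Both interpolators reproduce linear functions exactly; for example:
V = np.fromfunction(lambda i, j, k: i + 2 * j + 3 * k, (4, 4, 4))
x = np.array([1.3, 2.7, 0.5])
print(multilinear_interpolate(V, x))    # 8.2
print(simplex_interpolate(V, x))        # 8.2

Because the simplex version reads only d + 1 grid values per query rather than 2^d, it is the natural candidate when every backup of value iteration or Q-learning must interpolate in a higher-dimensional state space.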
Related resources
Convergent Reinforcement Learning with Value Function Interpolation
We consider the convergence of a class of reinforcement learning algorithms combined with value function interpolation methods, using the techniques developed in (Littman & Szepesvári, 1996). As a special case of the obtained general results, we prove, for the first time, the (almost sure) convergence of Q-learning when combined with value function interpolation in uncountable spaces.
Learning Qualitative Models through Partial Derivatives by Padé
Padé is a new method for learning qualitative models from observation data by computing partial derivatives from the data. Padé estimates partial derivatives of a target function from the learning data by splitting the attribute space into triangles or stars from a Delaunay triangulation, or into tubes, and computing a linear interpolation or regression within these regions. Generalization is t...
Reinforcement Algorithms Using Functional Approximation for Generalization and their Application to Cart Centering and Fractal Compression
We address the conflict between identification and control, or alternatively the conflict between exploration and exploitation, within the framework of reinforcement learning. Q-learning has recently become a popular off-policy reinforcement learning method. The conflict between exploration and exploitation slows down Q-learning algorithms; their performance does not scale up and degrades rapidly...
Barycentric Approximator for Reinforcement Learning Control
Recently, various experiments applying reinforcement learning methods to the self-learning intelligent control of continuous dynamic systems have been reported in the machine learning research community. The reports show mixed results, with some successes and some failures, and indicate that the success of reinforcement learning methods in application to the intelligent control of contin...
Convergent Combinations of Reinforcement Learning with Linear Function Approximation
Convergence of iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient ...
Publication date: 1996